
Conversation


@yongkangc yongkangc commented Oct 28, 2025

Closes #19249

Eliminates 2-5ms of sorting overhead per block by returning TrieInputSorted instead of unsorted TrieInput from compute_trie_input, and by having ExecutedBlock use the sorted representation.

Previously, MultiProofConfig::from_input would call drain_into_sorted() on both nodes and state, performing expensive sorting operations every block. Now compute_trie_input sorts once at the end and returns sorted data, making MultiProofConfig::from_input a simple Arc wrapper.
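The "simple Arc wrapper" shape could be sketched as follows. All types here (`TrieInputSortedSketch`, `MultiProofConfigSketch`, the `(u64, u32)` entries) are simplified stand-ins for illustration, not reth's real definitions:

```rust
use std::sync::Arc;

// Hypothetical stand-in for TrieInputSorted: data already sorted once
// by compute_trie_input before it reaches this constructor.
pub struct TrieInputSortedSketch {
    pub nodes: Vec<(u64, u32)>, // sorted trie updates (stand-in)
    pub state: Vec<(u64, u32)>, // sorted hashed post-state (stand-in)
}

pub struct MultiProofConfigSketch {
    pub nodes: Arc<Vec<(u64, u32)>>,
    pub state: Arc<Vec<(u64, u32)>>,
}

impl MultiProofConfigSketch {
    // Before: drain_into_sorted() on both fields, every block.
    // After: a cheap move into Arcs, no per-block sorting here.
    pub fn from_input(input: TrieInputSortedSketch) -> Self {
        Self { nodes: Arc::new(input.nodes), state: Arc::new(input.state) }
    }
}

fn main() {
    let input = TrieInputSortedSketch {
        nodes: vec![(1, 10), (2, 20)],
        state: vec![(3, 30)],
    };
    let config = MultiProofConfigSketch::from_input(input);
    assert_eq!(config.nodes.len(), 2);
    assert_eq!(config.state[0], (3, 30));
    println!("wrapped without sorting");
}
```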

Changes:
- Add TrieInputSorted type with sorted TrieUpdates and HashedPostState
- Add clear() methods to TrieUpdatesSorted and HashedPostStateSorted
- Update compute_trie_input to return (TrieInputSorted, BlockNumber)
- Update MultiProofConfig::from_input to accept TrieInputSorted
- Update BasicEngineValidator to store Option<TrieInputSorted>

The implementation uses a "build unsorted, sort once" strategy:
unsorted HashMap-based structures are used during building for fast
extend operations, then sorted once before returning. This eliminates
redundant sorting while maintaining performance.
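The strategy can be sketched in miniature. This is an illustrative toy, not reth code: the `(u64, &str)` entries stand in for the actual trie-update and hashed-state types:

```rust
use std::collections::HashMap;

// "Build unsorted, sort once": accumulate per-block updates in a
// HashMap (fast O(1) extend/overwrite), then sort a single time at
// the end instead of sorting after every block.
fn build_then_sort(blocks: &[Vec<(u64, &'static str)>]) -> Vec<(u64, &'static str)> {
    let mut unsorted: HashMap<u64, &'static str> = HashMap::new();
    for block in blocks {
        // Later blocks overwrite earlier entries without any re-sorting.
        unsorted.extend(block.iter().copied());
    }
    // Sort exactly once, just before handing the data out.
    let mut sorted: Vec<_> = unsorted.into_iter().collect();
    sorted.sort_unstable_by_key(|(k, _)| *k);
    sorted
}

fn main() {
    let blocks = vec![vec![(3, "a"), (1, "b")], vec![(2, "c"), (1, "d")]];
    let sorted = build_then_sort(&blocks);
    // (1, "d") wins because the second block overwrote key 1.
    assert_eq!(sorted, vec![(1, "d"), (2, "c"), (3, "a")]);
    println!("{:?}", sorted);
}
```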

@yongkangc yongkangc added C-perf A change motivated by improving speed, memory usage or disk footprint A-engine Related to the engine implementation labels Oct 28, 2025
@yongkangc yongkangc requested a review from Rjected as a code owner October 28, 2025 04:15
@github-project-automation github-project-automation bot moved this to Backlog in Reth Tracker Oct 28, 2025
@yongkangc yongkangc marked this pull request as draft October 28, 2025 04:21
@yongkangc yongkangc self-assigned this Oct 28, 2025
@yongkangc yongkangc moved this from Backlog to In Progress in Reth Tracker Oct 28, 2025
Updated the handling of trie input in the compute_trie_input function to improve performance and memory efficiency. The changes include:

- Replaced the use of Option<TrieInputSorted> with Option<TrieInput> to allow for better reuse of allocated capacity.
- Introduced a new method, drain_into_sorted, in TrieInput to convert it into TrieInputSorted while retaining HashMap capacity for subsequent operations.
- Adjusted the logic in compute_trie_input to utilize the new method, reducing unnecessary allocations and improving performance during block validations.

These modifications streamline the trie input processing, enhancing overall efficiency in the engine's validation workflow.
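The capacity-retention idea behind drain_into_sorted can be shown with `HashMap::drain`, which empties a map while keeping its allocation for reuse. The type below is a simplified stand-in, not reth's TrieInput:

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for TrieInput; a single map is enough to
/// show the capacity-reuse idea.
#[derive(Default)]
struct TrieInputSketch {
    state: HashMap<u64, u32>,
}

impl TrieInputSketch {
    /// Drain into a sorted Vec while keeping the map's buckets
    /// allocated, so the next block's build phase reuses them.
    fn drain_into_sorted(&mut self) -> Vec<(u64, u32)> {
        let mut sorted: Vec<_> = self.state.drain().collect();
        sorted.sort_unstable_by_key(|(k, _)| *k);
        sorted
    }
}

fn main() {
    let mut input = TrieInputSketch::default();
    input.state.extend([(2, 20), (1, 10)]);
    let cap_before = input.state.capacity();
    let sorted = input.drain_into_sorted();
    assert_eq!(sorted, vec![(1, 10), (2, 20)]);
    // drain() empties the map but retains the allocated capacity.
    assert!(input.state.is_empty());
    assert!(input.state.capacity() >= cap_before);
    println!("capacity retained: {}", input.state.capacity());
}
```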
@yongkangc yongkangc added the A-trie Related to Merkle Patricia Trie implementation label Oct 29, 2025
Replaced HashedPostState and TrieUpdates with their sorted counterparts, HashedPostStateSorted and TrieUpdatesSorted, in the ExecutedBlock struct. Keeping the trie updates and hashed state in sorted order improves performance during block execution and validation.

- The previous approach:
  - Converts every sorted block back into hash maps (cloning all
    keys/values once per block) because extend_with_blocks works
    on the unsorted representation.
  - After all that, drain_into_sorted() iterates those hash maps,
    builds sorted Vecs, and drains the allocations—more cloning
    and shuffling before we return to the same sorted layout we
    could have maintained from the start.

- So the new loop cuts out the conversion overhead and reduces
  allocations; the old code was strictly more work for the same
  end result.
These changes improve performance by:

1.  **Avoiding Costly Allocations:** Instead of creating and merging many temporary `HashMap`s, the code now builds the final `HashMap` directly from sorted lists (`Vec`s). This drastically reduces memory allocation overhead.

2.  **Faster CPU Operations:** Iterating over a `Vec` is more cache-friendly and faster for the CPU than iterating over a `HashMap`. Additionally, you removed a redundant lookup (`.remove()`) from a critical loop, saving extra CPU cycles.
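A minimal sketch of the new direction: extend the accumulator directly from a block's sorted Vec instead of first rebuilding a per-block HashMap and merging maps. Names and entry types are illustrative only:

```rust
use std::collections::HashMap;

// Extend an unsorted accumulator straight from sorted per-block data.
// A linear, cache-friendly walk over the Vec replaces per-block
// HashMap construction (and the cloning it implied).
fn extend_from_sorted(acc: &mut HashMap<u64, u32>, sorted_block: &[(u64, u32)]) {
    // Reserve once up front to avoid repeated rehashing.
    acc.reserve(sorted_block.len());
    for &(key, value) in sorted_block {
        // Later blocks simply overwrite earlier entries; no extra
        // lookup-then-remove step in the hot loop.
        acc.insert(key, value);
    }
}

fn main() {
    let mut acc = HashMap::new();
    extend_from_sorted(&mut acc, &[(1, 10), (2, 20)]);
    extend_from_sorted(&mut acc, &[(2, 21), (3, 30)]);
    assert_eq!(acc.get(&2), Some(&21)); // newer block wins
    assert_eq!(acc.len(), 3);
    println!("{} entries", acc.len());
}
```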
- Since TrieInputSorted already has the Arc-wrapped nodes and state, we
  can use them directly without creating an intermediate wrapper struct.
  After eliminating both call sites, MultiProofConfig became dead code,
  so we removed it.
@yongkangc yongkangc marked this pull request as ready for review October 30, 2025 09:58

@mediocregopher mediocregopher left a comment


Looking good! I think the biggest open question is what to do about ExecutedBlock, which seems to get used in a lot of places that maybe we don't want this change to touch.

execution_output: Arc::new(execution_output),
hashed_state: Arc::new(hashed_state),
trie_updates: Arc::new(trie_updates.into()),
hashed_state: Arc::new(hashed_state.into_sorted()),

Even if we still have an into_sorted here, it's better that it's hashed state because that's usually an order of magnitude smaller than trie updates 👍

input.state = Arc::clone(&first.hashed_state);
input.nodes = Arc::clone(&first.trie_updates);

// Only clone and mutate if there are multiple in-memory blocks.

Nice

execution_output: Arc::new(execution_outcome),
hashed_state: Arc::new(hashed_state),
trie_updates: Arc::new(trie_updates),
hashed_state: Arc::new(hashed_state.into_sorted()),

This is a complication that I hadn't thought about... that we might be making block building slower by forcing the builder to do the sorting itself, rather than the validator. There are a few possibilities here:

  • Use an Either for hashed_state/trie_updates fields on ExecutedBlock, so they can be sorted or not.
  • Use a different ExecutedBlock struct for block building, which doesn't seem wrong but might be a big change
  • Use an ExecutedBlockExt kind of struct in the engine tree, which might look like:
struct ExecutedBlockExt {
    executed_block: ExecutedBlock, // retains unsorted updates/state
    trie_updates_sorted: TrieUpdatesSorted,
    hashed_state_sorted: HashedPostStateSorted,
}

...and leave other places untouched
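The first option, an Either for the fields, could look roughly like this. Everything here (`EitherState`, the toy `Unsorted`/`Sorted` wrappers) is hypothetical, not actual reth code:

```rust
use std::sync::Arc;

// A field that holds either representation, so the block builder can
// keep unsorted data and only the validator path pays for sorting.
enum EitherState<U, S> {
    Unsorted(Arc<U>),
    Sorted(Arc<S>),
}

impl<U: Clone + Into<S>, S: Clone> EitherState<U, S> {
    // Consumers that need sorted data convert lazily.
    fn into_sorted(self) -> S {
        match self {
            EitherState::Sorted(s) => (*s).clone(),
            EitherState::Unsorted(u) => (*u).clone().into(),
        }
    }
}

// Toy representations standing in for TrieUpdates / TrieUpdatesSorted.
#[derive(Clone)]
struct Unsorted(Vec<u64>);
#[derive(Clone, Debug, PartialEq)]
struct Sorted(Vec<u64>);

impl From<Unsorted> for Sorted {
    fn from(u: Unsorted) -> Self {
        let mut v = u.0;
        v.sort_unstable();
        Sorted(v)
    }
}

fn main() {
    // Builder side: stays unsorted, sort happens only on demand.
    let builder = EitherState::<Unsorted, Sorted>::Unsorted(Arc::new(Unsorted(vec![3, 1, 2])));
    assert_eq!(builder.into_sorted(), Sorted(vec![1, 2, 3]));
    // Validator side: already sorted, conversion is just a clone.
    let validator = EitherState::<Unsorted, Sorted>::Sorted(Arc::new(Sorted(vec![1, 2])));
    assert_eq!(validator.into_sorted(), Sorted(vec![1, 2]));
    println!("ok");
}
```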

cc @mattsse

@yongkangc (Member Author):

hmm yeah, would like to get @mattsse thoughts on what might be the best way to deal with this re block building and ExecutedBlock abstraction


// insert hashes and intermediate merkle nodes
self.write_hashed_state(&Arc::unwrap_or_clone(hashed_state).into_sorted())?;
self.write_hashed_state(&hashed_state)?;

🔥 🔥 🔥



Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Return sorted data from compute_trie_input

3 participants